Graph integration of structured, semistructured and unstructured data for data journalism
Authors
Abstract
Digital data is a gold mine for modern journalism. However, the datasets that interest journalists are extremely heterogeneous, ranging from highly structured (relational databases) to semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets along the lines described above: the challenges we faced to make such graphs useful and to allow their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
Similar resources
A Unified Approach to Structured, Semistructured and Unstructured Data
At the present time, the way in which we manage data depends on its structural features. In this report we propose a logical model and algebra which represents a step further in the process of bridging the gap between different data modeling approaches. In particular, the focus is on structured and semistructured data. Our model is based on set theory, as in the relational context, and on data ...
Warehousing Structured and Unstructured Data for Data
More data, especially unstructured data, is available to users than ever. There is so much data available that it is difficult for users to make use of their data in its raw form. To handle the diversity of data types, we have designed and prototyped a multidatabase/warehouse system. The system has been especially designed to facilitate the interaction of structured and unstructured data. The sys...
Structured Queries for Semistructured Probabilistic Data
We present SPOQL, a structured query language for the Semistructured Probabilistic Object (SPO) model [4]. The original query language, SPAlgebra [4], has traditional limitations such as terse functional notation and unfamiliarity to application programmers. SPOQL alleviates these problems by providing familiar SQL-like declarative syntax. We show that parsing SPOQL queries is a more involving task than ...
Ozone: Integrating Structured and Semistructured Data
Applications have an increasing need to manage semistructured data (such as data encoded in XML) along with conventional structured data. We extend the structured object database model ODMG and its query language OQL with the ability to handle semistructured data based on the OEM model and Lorel language, and we implement our extensions in a system called Ozone. In our approach, structured data...
Semantic Integration of Structured and Unstructured Data in Data Warehousing and Knowledge Management Systems
Nowadays, increasing information in enterprises demands new ways of searching and connecting the existing information systems. This chapter describes an approach for the integration of structured and unstructured data focusing on the application to Data Warehousing (DW) and Knowledge Management (KM). Semantic integration is used to improve the interoperability between two well-known and establi...
Journal
Journal title: Information Systems
Year: 2022
ISSN: 0306-4379, 1873-6076
DOI: https://doi.org/10.1016/j.is.2021.101846